Intelligent Communication Mixture-of-Experts Boosted-Medical Image Segmentation Foundation Model
Zhang, Xinwei, Chen, Hu, Yuan, Zhe, Tian, Sukun, Feng, Peng
Foundation models for medical image segmentation have achieved remarkable performance. Adaptive fine-tuning of natural image segmentation foundation models is crucial for medical image segmentation tasks. However, existing fine-tuning methods have two limitations: 1) insufficient representation of high-level features and 2) disruption of the structural integrity of the pretrained weights during fine-tuning. Motivated by these critical problems, we propose an intelligent communication mixture-of-experts boosted medical image segmentation foundation model, named IC-MoE, with twofold ideas: 1) We construct basic experts, semantic experts, and adaptive experts. Moreover, we implement a pixel probability adaptive voting strategy, which enables expert selection and fusion through label consistency and load balancing. This approach preliminarily enhances the representation capability of high-level features while preserving the structural integrity of the pretrained weights. 2) We propose a semantic-guided contrastive learning method to address the issue of weak supervision in contrastive learning. This method further enhances the representation capability of high-level features while preserving the structural integrity of the pretrained weights. Extensive experiments across three public medical image segmentation datasets demonstrate that IC-MoE outperforms other SOTA models. Consequently, the proposed IC-MoE effectively supplements foundation medical image segmentation models with high-level features and pretrained structural integrity. We also validate the superior generalizability of IC-MoE across diverse medical image segmentation scenarios.
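A minimal sketch of a per-pixel mixture-of-experts segmentation head in the spirit of the abstract above. The three parallel expert heads and the softmax-gated per-pixel fusion are illustrative assumptions; they are not the authors' IC-MoE architecture or their pixel probability adaptive voting rule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelMoESegHead(nn.Module):
    """Toy per-pixel mixture-of-experts head (assumed design, not IC-MoE)."""
    def __init__(self, in_ch: int, num_classes: int, num_experts: int = 3):
        super().__init__()
        # "Basic", "semantic", and "adaptive" experts are modeled here as
        # three parallel 1x1 convolutional heads (a simplifying assumption).
        self.experts = nn.ModuleList(
            [nn.Conv2d(in_ch, num_classes, kernel_size=1) for _ in range(num_experts)]
        )
        # Gating network produces per-pixel expert weights.
        self.gate = nn.Conv2d(in_ch, num_experts, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) backbone features
        logits = torch.stack([e(feats) for e in self.experts], dim=1)   # (B, E, K, H, W)
        weights = F.softmax(self.gate(feats), dim=1).unsqueeze(2)       # (B, E, 1, H, W)
        # Per-pixel fusion: probability-weighted combination of expert predictions.
        return (weights * logits).sum(dim=1)                            # (B, K, H, W)

x = torch.randn(2, 64, 32, 32)
head = PixelMoESegHead(in_ch=64, num_classes=4)
print(head(x).shape)  # torch.Size([2, 4, 32, 32])
```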
Guess-and-Learn (G&L): Measuring the Cumulative Error Cost of Cold-Start Adaptation
Evaluation of machine learning models typically emphasizes final accuracy, overlooking the cost of adaptation: the cumulative errors incurred while learning from scratch. Guess-and-Learn (G&L) v1.0 addresses this gap by measuring cold-start adaptability: the total mistakes a model makes while sequentially labeling an unlabeled dataset. At each step, the learner selects an instance, predicts its label, receives the ground truth, and updates parameters under either online (per-sample) or batch (delayed) mode. The resulting error trajectory exposes adaptation speed, selection quality, and bias, dynamics that are invisible to endpoint metrics. G&L defines four tracks (Scratch/Pretrained $\times$ Online/Batch) to disentangle the effects of initialization and update frequency. We formalize the protocol, relate it to classical mistake-bound theory, and estimate a heuristic "oracle reference band" for MNIST as a plausibility reference. Baseline experiments on MNIST and AG News, spanning classical methods (Perceptron, k-NN), convolutional architectures (CNN, ResNet-50), and pretrained transformers (ViT-B/16, BERT-base), reveal systematic differences in early-phase efficiency: smaller models can adapt with fewer initial errors, while the benefit of pretraining varies by domain. Across settings, current models remain well above the oracle band, highlighting an adaptability gap. By quantifying the mistake cost of early learning, G&L complements conventional benchmarks and provides a reproducible framework for developing learners that are not only accurate in the limit but also reliable from the first examples.
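The online track of the protocol described above amounts to a simple sequential loop: pick an unlabeled instance, guess its label, pay one error if the guess is wrong, then update on the revealed ground truth. The sketch below illustrates that loop with a toy incremental learner; the function and class names (guess_and_learn, NearestMean) are illustrative placeholders, not the official G&L v1.0 API.

```python
import numpy as np

def guess_and_learn(learner, X, y, seed=0):
    """Online-track sketch: cumulative mistakes over one pass of the data."""
    rng = np.random.default_rng(seed)
    unlabeled = list(range(len(X)))
    mistakes, trajectory = 0, []
    while unlabeled:
        # Selection step: uniform random here; a real learner may rank by confidence.
        i = unlabeled.pop(int(rng.integers(len(unlabeled))))
        y_hat = learner.predict(X[i])       # guess before seeing the label
        mistakes += int(y_hat != y[i])      # pay the error cost
        learner.update(X[i], y[i])          # online (per-sample) update
        trajectory.append(mistakes)
    return mistakes, trajectory             # final cost and cumulative-error curve

class NearestMean:
    """Toy incremental learner: per-class running mean of the features."""
    def __init__(self):
        self.means, self.counts = {}, {}
    def predict(self, x):
        if not self.means:
            return -1  # forced guess before any labels have been seen
        return min(self.means, key=lambda c: np.linalg.norm(x - self.means[c]))
    def update(self, x, c):
        n = self.counts.get(c, 0)
        self.means[c] = (self.means.get(c, np.zeros_like(x)) * n + x) / (n + 1)
        self.counts[c] = n + 1

X = np.random.default_rng(1).normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
total, _ = guess_and_learn(NearestMean(), X, y)
print("cumulative mistakes:", total)
```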
Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics
Li, Shiwei, Luo, Xiandi, Tang, Xing, Wang, Haozhao, Chen, Hao, Luo, Weihong, Li, Yuhua, He, Xiuqiang, Li, Ruixuan
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method. In standard LoRA layers, one of the matrices, $A$ or $B$, is initialized to zero, ensuring that fine-tuning starts from the pretrained model. However, there is no theoretical support for this practice. In this paper, we investigate the impact of non-zero initialization on LoRA's fine-tuning dynamics from an infinite-width perspective. Our analysis reveals that, compared to zero initialization, simultaneously initializing $A$ and $B$ to non-zero values improves LoRA's robustness to suboptimal learning rates, particularly smaller ones. Further analysis indicates that although the non-zero initialization of $AB$ introduces random noise into the pretrained weight, it generally does not affect fine-tuning performance. In other words, fine-tuning does not need to strictly start from the pretrained model. The validity of our findings is confirmed through extensive experiments across various models and datasets. The code is available at https://github.com/Leopold1423/non_zero_lora-icml25.
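The contrast studied above can be made concrete with a small LoRA layer: with the standard initialization one of the two matrices is zero, so the effective weight at step 0 equals the pretrained weight, whereas initializing both $A$ and $B$ to non-zero values perturbs it by $B_0A_0$. The layer below is an illustrative sketch, not the authors' released code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen weight W plus a low-rank update, with optional non-zero init of B."""
    def __init__(self, d_in, d_out, r=8, alpha=16.0, nonzero_init=False, std=1e-3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)  # frozen W
        self.A = nn.Parameter(torch.randn(r, d_in) * std)
        if nonzero_init:
            self.B = nn.Parameter(torch.randn(d_out, r) * std)  # both A and B non-zero
        else:
            self.B = nn.Parameter(torch.zeros(d_out, r))        # standard LoRA: Delta W = 0 at init
        self.scale = alpha / r

    def forward(self, x):
        return x @ (self.weight + self.scale * self.B @ self.A).T

layer = LoRALinear(16, 16, nonzero_init=True)
with torch.no_grad():
    # Random "noise" added to the pretrained weight at step 0 by non-zero init.
    drift = (layer.scale * layer.B @ layer.A).norm()
print(f"initial deviation from the pretrained weight: {drift:.2e}")
```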
SRLoRA: Subspace Recomposition in Low-Rank Adaptation via Importance-Based Fusion and Reinitialization
Yang, Haodong, Wang, Lei, Hossain, Md Zakir
Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method that injects two trainable low-rank matrices (A and B) into frozen pretrained models. While efficient, LoRA constrains updates to a fixed low-rank subspace ($\Delta W = BA$), which can limit representational capacity and hinder downstream performance. We introduce Subspace Recomposition in Low-Rank Adaptation (SRLoRA) via importance-based fusion and reinitialization, a novel approach that enhances LoRA's expressiveness without compromising its lightweight structure. SRLoRA assigns importance scores to each LoRA pair (a column of B and the corresponding row of A), and dynamically recomposes the subspace during training. Less important pairs are fused into the frozen backbone, freeing capacity to reinitialize new pairs along unused principal directions derived from the pretrained weight's singular value decomposition. This mechanism enables continual subspace refreshment and richer adaptation over time, without increasing the number of trainable parameters. We evaluate SRLoRA on both language and vision tasks, including the GLUE benchmark and various image classification datasets. SRLoRA consistently achieves faster convergence and improved accuracy over standard LoRA, demonstrating its generality, efficiency, and potential for broader PEFT applications.
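A rough sketch of the recomposition step described above: rank-one pairs (a row of A with its column of B) with low importance are folded into the frozen weight, and the freed slots are re-seeded along singular directions of the pretrained weight. The importance score used here (the product of the pair's norms) and the "unused direction" bookkeeping are simplifying assumptions, not the paper's exact criteria or schedule.

```python
import torch

def recompose(W_frozen, A, B, keep_ratio=0.5, used_dirs=0, init_scale=1e-3):
    """One SRLoRA-style fusion/reinitialization step (assumed scoring rule)."""
    r = A.shape[0]
    # Importance of each rank-one component Delta W_i = B[:, i] A[i, :].
    scores = B.norm(dim=0) * A.norm(dim=1)
    k = max(1, int(keep_ratio * r))
    low = torch.argsort(scores)[: r - k]          # least important pairs

    # Fuse the low-importance components into the frozen backbone weight.
    W_frozen = W_frozen + B[:, low] @ A[low, :]

    # Reinitialize the freed pairs along the next unused singular directions of W.
    U, S, Vh = torch.linalg.svd(W_frozen, full_matrices=False)
    new_dirs = torch.arange(used_dirs, used_dirs + len(low)) % S.shape[0]
    A, B = A.clone(), B.clone()
    A[low, :] = init_scale * Vh[new_dirs, :]
    B[:, low] = init_scale * U[:, new_dirs]
    return W_frozen, A, B, used_dirs + len(low)

W = torch.randn(32, 32)
A, B = 1e-3 * torch.randn(8, 32), 1e-3 * torch.randn(32, 8)
W, A, B, used = recompose(W, A, B)
print(W.shape, A.shape, B.shape, used)
```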
Decoupling Angles and Strength in Low-rank Adaptation
Bini, Massimo, Girrbach, Leander, Akata, Zeynep
Parameter-Efficient FineTuning (PEFT) methods have recently gained significant popularity thanks to the widespread availability of large-scale pretrained models. These methods allow for quick adaptation to downstream tasks with minimal computational cost. However, popular finetuning methods such as LoRA exhibit limited robustness when it comes to hyperparameter choices or extended training regimes, preventing optimal out-of-the-box performance. In contrast, bounded approaches, such as ETHER, provide greater robustness but are limited to extremely low-rank adaptations and fixed-strength transformations, reducing their adaptation expressive power. In this work, we propose Decoupled Low-rank Adaptation (DeLoRA), a novel finetuning method that normalizes and scales learnable low-rank matrices. Through evaluations on subject-driven image generation, natural language understanding, and instruction tuning, we show that DeLoRA matches or surpasses the performance of competing PEFT methods while exhibiting stronger robustness.

The rapid advancement of deep learning has led to the development of large-scale pretrained models in various domains, especially in computer vision and natural language processing (Touvron et al., 2023a;b; Radford et al., 2021; Rombach et al., 2022). However, the enormous size of these models, reaching billions of parameters, presents significant challenges when adapting them to specific downstream tasks, particularly in terms of computational cost and efficiency. To address these challenges, Parameter-Efficient FineTuning (PEFT) methods have emerged. PEFT methods are characterized by their introduction of a small set of learnable parameters, in contrast to the extensive parameter updates required in full finetuning. Notable examples include adapters (Houlsby et al., 2019) and prompt tuning (Lester et al., 2021).
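A hedged sketch of the decoupling idea described above: the direction ("angle") of the low-rank update comes from the learnable matrices B and A, while its magnitude ("strength") is a separate learnable scalar applied after normalization. The Frobenius-norm normalization below is a simplification for illustration, not the authors' exact DeLoRA update rule.

```python
import torch
import torch.nn as nn

class DecoupledLowRankUpdate(nn.Module):
    """Low-rank update whose norm is controlled by a separate strength parameter."""
    def __init__(self, d_in, d_out, r=8, init_strength=1.0, eps=1e-8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(r, d_in) / d_in**0.5)
        self.B = nn.Parameter(torch.randn(d_out, r) / r**0.5)
        self.strength = nn.Parameter(torch.tensor(init_strength))  # bounded update scale
        self.eps = eps

    def delta_w(self):
        BA = self.B @ self.A
        # Normalize the direction, then scale it: angle and strength are decoupled.
        return self.strength * BA / (BA.norm() + self.eps)

delta = DecoupledLowRankUpdate(16, 16)
print(delta.delta_w().norm())  # approximately |strength|, independent of the scale of B and A
```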